The hexscape package has a number of uses (some of which are not yet implemented):
Facilitate use of vectorised spatial data for EU/EEA countries, and/or regions/areas (as defined by NUTS1/2/3) of these countries
Facilitate use of additional vectorised spatial data on parishes, kommune and postcodes for Denmark
Subdivision of the above spatial data into `patches’ based on either polygons of arbitrary size, or Voronoi tessellation around points of interest supplied by the user
Facilitate use of Corine land usage data for EU/EEA countries (or regions/areas thereof) and aggregation of different land cover types within the spatial areas and/or patches defined above
Facilitate generating networks of distances and/or adjacencies for these polygons
Facilitating sampling of random points based on landscapes and/or real points provided
The data comes from a combination of Eurostat, Corine, and other sources - all of which is freely available online. Much of the internal functionality is provided by the sf package.
The package isn’t on CRAN (yet?), but the most recent stable (ish) version can be installed from our drat repository using:
install.packages("hexscape", repos=c(CRAN = "https://cran.rstudio.com/",
`ku-awdc` = "https://ku-awdc.github.io/drat/"))
Or, for the most recent (but even less stable) version you can install from GitHub:
# remotes::install_github("ku-awdc/hexscape")
Then you should be able to load the package:
library("hexscape")
## Loading required package: tidyverse
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Note that this also currently loads tidyverse, as that is what I used to develop the package, and I was too lazy to import everything properly. At some point, this will change, so it is probably best to explicitly load tidyverse and sf (due to the heavy reliance on sf data frames) as well:
library("tidyverse")
library("sf")
## Linking to GEOS 3.10.2, GDAL 3.4.2, PROJ 8.2.1; sf_use_s2() is TRUE
(Unless of course you don’t use tidyverse, in which case you should probably read https://r4ds.had.co.nz/index.html a couple more times)
The hexscape package uses local caching of intermediate data files, as most of the spatial operations take quite a long time to process from the raw data. So, the first step is to create a folder somewhere on your hard drive (NOT a network storage drive) to/from which files can be saved/loaded:
path_to_folder <- "/some/path/to/a/folder"
Then run the following code:
set_storage_folder(path_to_folder)
You should now see that hexscape has created some subfolders:
list.files(path_to_folder)
## [1] "eurostat_cache" "landscapes" "processed_data" "raw_data"
This storage folder needs to be set every time you load hexscape. To avoid having to do this manually, you can put the following code in your .Rprofile:
## To open your .Rprofile try:
usethis::edit_r_profile()
## Then add the following to the bottom of that file:
Sys.setenv("HEXSCAPE_STORAGE"="/some/path/to/a/folder")
# [obviously replacing /some/path/to/a/folder with your path]
That way, the storage folder will be set for you automatically when hexscape is loaded.
The hexscape package makes heavy use of NUTS (Nomenclature of territorial units for statistics; 2021 version) to define spatial areas. These are defined at levels 0 (national) to 3 (smallest scale), with the number of NUTS units depending on the country. Some datasets are inbuilt to help with cross-referencing:
nuts_codes
## # A tibble: 2,010 × 5
## Code Country Level NUTS Label
## <chr> <chr> <int> <chr> <chr>
## 1 AL Albania 0 AL Shqipëria
## 2 AT Austria 0 AT Österreich
## 3 BE Belgium 0 BE Belgique/België
## 4 BG Bulgaria 0 BG България
## 5 CH Switzerland 0 CH Schweiz/Suisse/Svizzera
## 6 CY Cyprus 0 CY Κύπρος
## 7 CZ Czechia 0 CZ Česko
## 8 DE Germany 0 DE Deutschland
## 9 DK Denmark 0 DK Danmark
## 10 EE Estonia 0 EE Eesti
## # ℹ 2,000 more rows
nuts_codes |> filter(Code=="DK")
## # A tibble: 18 × 5
## Code Country Level NUTS Label
## <chr> <chr> <int> <chr> <chr>
## 1 DK Denmark 0 DK Danmark
## 2 DK Denmark 1 DK0 Danmark
## 3 DK Denmark 2 DK01 Hovedstaden
## 4 DK Denmark 2 DK02 Sjælland
## 5 DK Denmark 2 DK03 Syddanmark
## 6 DK Denmark 2 DK04 Midtjylland
## 7 DK Denmark 2 DK05 Nordjylland
## 8 DK Denmark 3 DK011 Byen København
## 9 DK Denmark 3 DK012 Københavns omegn
## 10 DK Denmark 3 DK013 Nordsjælland
## 11 DK Denmark 3 DK014 Bornholm
## 12 DK Denmark 3 DK021 Østsjælland
## 13 DK Denmark 3 DK022 Vest- og Sydsjælland
## 14 DK Denmark 3 DK031 Fyn
## 15 DK Denmark 3 DK032 Sydjylland
## 16 DK Denmark 3 DK041 Vestjylland
## 17 DK Denmark 3 DK042 Østjylland
## 18 DK Denmark 3 DK050 Nordjylland
There is also a list of countries indicating whether or not that country has Eurostat spatial information available (and can therefore be used with the hexscape package):
country_codes
## # A tibble: 42 × 6
## Code Country Eurostat Native French German
## <chr> <chr> <lgl> <chr> <chr> <chr>
## 1 AL Albania TRUE Shqipëria Albanie Alban…
## 2 AT Austria TRUE Österreich Autriche Öster…
## 3 BA Bosnia and Herzegovina FALSE Bosna i Hercegovina Bosnie … Bosni…
## 4 BE Belgium TRUE Belgique/België Belgique Belgi…
## 5 BG Bulgaria TRUE Bulgarija Bulgarie Bulga…
## 6 CH Switzerland TRUE Schweiz/Suisse/Svizzera Suisse Schwe…
## 7 CY Cyprus TRUE Kýpros Chypre Zypern
## 8 CZ Czechia TRUE Česko Tchéquie Tsche…
## 9 DE Germany TRUE Deutschland Allemag… Deuts…
## 10 DK Denmark TRUE Danmark Danemark Dänem…
## # ℹ 32 more rows
We use the spatial data provided by Eurostat to define the boundaries of NUTS areas. This can be downloaded from: https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units/nuts#nuts21
You need to select the following options:
Then hit the download button and extract the resulting ZIP file. You should end up with a 29.2 MB folder called “NUTS_RG_01M_2021_3035.shp”. Put this folder inside the “raw_data” folder of the hexscape storage folder you created above.
You should then be able to load a map for any NUTS area at any level, for example:
dk <- load_map("DK")
dk
## Simple feature collection with 1 feature and 5 fields
## Active geometry column: geometry
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 4199561 ymin: 3496605 xmax: 4650503 ymax: 3850132
## Projected CRS: ETRS89-extended / LAEA Europe
## # A tibble: 1 × 7
## Code Country NUTS Level Label centroid
## * <chr> <chr> <chr> <int> <chr> <POINT [m]>
## 1 DK Denmark DK 0 Danmark (4278397 3656722)
## # ℹ 1 more variable: geometry <MULTIPOLYGON [m]>
ggplot(dk) + geom_sf() + theme_void()
Or for a sub-region of a country:
sj <- load_map("DK032")
ggplot(sj) + geom_sf() + theme_void()
The first time the maps for a specific country are fetched might take a couple of seconds to process from the raw data; subsequent loads will be from a country-specific cache (almost instantaneously).
The hexscape package uses Corine Land Cover (CLC) data as the raw underlying data for land usage. To use this, you need to download the SQLite Database from here: https://land.copernicus.eu/pan-european/corine-land-cover/clc2018?tab=download
You will first need to register an account and log in, but it is free to do so.
Once the file has downloaded (beware: it is 3.5GB) you can un-zip it and hopefully end up with a folder named u2018_clc2018_v2020_20u1_geoPackage - move (or copy/symlink if you prefer) this folder inside the raw_data folder that you set up above. Make sure to keep the name of the folder exactly as it is above.
The CLC data classifies land usage under a number of different categories, grouped at 3 different levels. To see what codes are defined use the following lookup table:
clc_codes
## # A tibble: 44 × 5
## CLC CLC_Label1 CLC_Label2 CLC_Label3 CLC_RGB
## <chr> <chr> <chr> <chr> <chr>
## 1 111 Artificial surfaces Urban fabric Continuou… 230-00…
## 2 112 Artificial surfaces Urban fabric Discontin… 255-00…
## 3 121 Artificial surfaces Industrial, commercial and tran… Industria… 204-07…
## 4 122 Artificial surfaces Industrial, commercial and tran… Road and … 204-00…
## 5 123 Artificial surfaces Industrial, commercial and tran… Port areas 230-20…
## 6 124 Artificial surfaces Industrial, commercial and tran… Airports 230-20…
## 7 131 Artificial surfaces Mine, dump and construction sit… Mineral e… 166-00…
## 8 132 Artificial surfaces Mine, dump and construction sit… Dump sites 166-07…
## 9 133 Artificial surfaces Mine, dump and construction sit… Construct… 255-07…
## 10 141 Artificial surfaces Artificial, non-agricultural ve… Green urb… 255-16…
## # ℹ 34 more rows
The final column defines a suitable colour code suggested for plotting.
The raw Corine data contains data on all polygons for all countries (i.e. it is vector rather than raster). In order to use it, we need to process the raw data to intersect with the specific NUTS1 area(s) we are interested in, and simplify the polygons slightly (for easier processing and plotting downstream). This is done automatically the first time you request a map, then the data is cached and re-used for subsequent requests of the same NUTS1 area. Simplification of polygons from different CLC codes is done ensuring that the total area remains the same, i.e. none of the simplified CLC polygons overlap, and the returned result is still a valid sf object. This all takes some time to process… For example, caching data for Denmark (NUTS1 code DK0) takes around 3 mins to run on my arm64 laptop (and more than double that on my ageing Xeon desktop):
dk_corine <- load_corine("DK0")
But subsequent requests for any data from within Denmark is instantaneous, for example:
dk032_corine <- load_corine("DK032")
## Returning corine data (by CLC and NUTS3 area)
ggplot(dk032_corine, aes(fill=CLC)) + geom_sf(lwd=0, colour=NA) + theme_void() + theme(legend.pos="none") + scale_fill_manual(values=clc_codes$CLC_RGB)
## TODO: implement an autoplot method
The default return value is a modified sf data frame, with 1 row per combination of CLC and NUTS3 area code in the region requested, and columns giving the CLC code and descriptions, country code and name, NUTS3 code, total area of the CLC type in the original data (in km2), total area of the CLC type in the simplified data, and the geometry. If you prefer a single row per CLC type for the entire area, then you can specify union=TRUE to load_corine().
The number of NUTS1 codes per country varies from 1 (around half of the countries) up to 16 (Germany), and each NUTS1 area is cached separately. If you request data for a whole country, this might mean it takes a long time to process all of the relevant NUTS1 areas.
If you know that you need to process a lot of different NUTS1 areas then you can set use_cache=TRUE for load_corine(); this does some pre-processing of the entire raw Corine database which takes around 5 minutes the first time, but speeds up subsequent calls to load_corine (for any NUTS1 area). The downside is that this uses around 5GB of extra hard drive space to store the cached intermediate files. Or you can ask me nicely for a specific set NUTS1 areas and I will send the pre-processed cache files to you … but please do register and download the Corine database yourself so that I am not violating their usage policies.
The load_corine() function provides spatial data for every CLC code separately, but you may want to filter and/or combine them. For now you have to do that manually - for example if we want to only extract farmland:
dk032_corine |>
filter(CLC_Label1 == "Agricultural areas") |>
summarise(Area = sum(Area), geometry = st_union(geometry)) ->
dk032_farmland
ggplot() +
geom_sf(data=load_map("DK032")) +
geom_sf(data=dk032_farmland, fill="dark green") +
theme_void()
TODO: At some point I will probably write a helper function e.g. aggregate_corine() for this…
TODO: incorporate code for kommune, sogne, and postcode shape files from Covid19tools package
Future TODO: maybe include other info (e.g. population densities) from Danmark’s Statistic via their API (or using https://github.com/rOpenGov/dkstat ??)
One option to discretise the spatial data is using Voronoi tesselation around a given number of spatial points. In real life this could be farm locations; let’s simulate N=100 farms within the farmland of south Jutland:
N <- 100L
tibble(Index = 1:N, point=st_sample(dk032_farmland, N)) |>
st_as_sf(sf_column_name="point") ->
fake_farms
ggplot() +
geom_sf(data=load_map("DK032")) +
geom_sf(data=dk032_farmland, fill="dark green", alpha=0.5) +
geom_sf(data=fake_farms, col="dark green") +
theme_void()
Then we can produce Voronoi tesselated polygons within this farmland as follows (where the first argument is the spatial mask for the area of interest, and the second is the points defining the Voronoi points):
dk032_voronoi <- discretise_voronoi(dk032_farmland, fake_farms)
This returns the original data frame, plus new columns for Area, centroid (of the Voronoi tesselated area after intersecting with the map), and geometry.
dk032_voronoi
## Simple feature collection with 100 features and 2 fields
## Active geometry column: geometry
## Geometry type: GEOMETRY
## Dimension: XY
## Bounding box: xmin: 4204284 ymin: 3521791 xmax: 4325551 ymax: 3650354
## Projected CRS: ETRS89-extended / LAEA Europe
## # A tibble: 100 × 5
## Index point Area centroid
## <int> <POINT [m]> <dbl> <POINT [m]>
## 1 1 (4242713 3622093) 50.4 (4243473 3621299)
## 2 2 (4234044 3596600) 54.9 (4234537 3597826)
## 3 3 (4229850 3576767) 6.71 (4228459 3577344)
## 4 4 (4267775 3570311) 40.6 (4267596 3570780)
## 5 5 (4250561 3543219) 254. (4253547 3541177)
## 6 6 (4235876 3612153) 30.8 (4236080 3614278)
## 7 7 (4295214 3594451) 65.8 (4294117 3594994)
## 8 8 (4291632 3566683) 69.9 (4297428 3568681)
## 9 9 (4248952 3570326) 113. (4248050 3569150)
## 10 10 (4212965 3612382) 55.1 (4213575 3615673)
## # ℹ 90 more rows
## # ℹ 1 more variable: geometry <GEOMETRY [m]>
These can be plotted to show the expected discrepancy between original farm location and centroid e.g.:
ggplot() +
geom_sf(data=load_map("DK032")) +
geom_sf(data=dk032_farmland, fill="dark green", alpha=0.5) +
geom_sf(data=dk032_voronoi, fill="transparent", col="red") +
geom_sf(data=dk032_voronoi, aes(geometry=centroid), col="dark green") +
geom_sf(data=fake_farms, col="red") +
theme_void()
TODO
TODO
TODO
TODO
A number of helper/utility functions are provided (mostly wrapping functionality in sf), some of which are described below.
To easily sample random points within a data frame containing polygons (either Voronoi tesselations or hexagons or anything else), you can use the sample_points function. This takes 2 arguments: the first is an sf data frame with 1 or more rows and a column giving the Index, and the second is the number of points to sample within each row.
random_points <- sample_points(dk032_voronoi, size=5L, verbose=0L)
random_points
## Simple feature collection with 500 features and 1 field
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 4208463 ymin: 3528774 xmax: 4323460 ymax: 3649730
## Projected CRS: ETRS89-extended / LAEA Europe
## First 10 features:
## Index geometry
## 1 1 POINT (4248886 3615240)
## 2 1 POINT (4247654 3617746)
## 3 1 POINT (4238852 3624152)
## 4 1 POINT (4240744 3622374)
## 5 1 POINT (4249690 3614803)
## 6 2 POINT (4234743 3595806)
## 7 2 POINT (4236357 3596280)
## 8 2 POINT (4232034 3598753)
## 9 2 POINT (4234313 3602730)
## 10 2 POINT (4233542 3599323)
The points are generated by rejection sampling over a bounding box that iteratively decreases in size once enough points have been found for each individual polygon. To plot the points:
ggplot() +
geom_sf(data=load_map("DK032")) +
geom_sf(data=dk032_farmland, fill="dark green", alpha=0.5) +
geom_sf(data=dk032_voronoi, fill="transparent", col="red") +
geom_sf(data=random_points, col="black", size=0.3) +
theme_void()
In this case, each farmland polygon should contain exactly 5 random points.
TODO: add a buffer argument for sample_points so that we can ensure that each sampled point is a minimum distance from all other sampled points (perhaps as a proportion of the observed minimum distance in the provided points)
A utility function is also provided to randomly re-place each provided point into a new location based on Voronoi tesselation. For example, starting with the fake farms and Corine-based farmland that we had before, we can ask for a new spatial location for each farm somewhere within the nearest 5 Voronoi tesselated spatial areas like so:
random_farms <- randomise_voronoi(dk032_farmland, fake_farms, randomise_size=5L, verbose=0L)
## Getting random points...
The output is the same as the input data frame, except that we have an additional column RandomPoint that contains the new spatial location:
random_farms
## Simple feature collection with 100 features and 1 field
## Active geometry column: point
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 4209788 ymin: 3532695 xmax: 4323950 ymax: 3646843
## Projected CRS: ETRS89-extended / LAEA Europe
## # A tibble: 100 × 3
## Index point RandomPoint
## * <int> <POINT [m]> <POINT [m]>
## 1 1 (4242713 3622093) (4241919 3627043)
## 2 2 (4234044 3596600) (4229756 3602037)
## 3 3 (4229850 3576767) (4236271 3577347)
## 4 4 (4267775 3570311) (4265267 3567122)
## 5 5 (4250561 3543219) (4256934 3556144)
## 6 6 (4235876 3612153) (4247956 3615358)
## 7 7 (4295214 3594451) (4289351 3603199)
## 8 8 (4291632 3566683) (4299946 3564961)
## 9 9 (4248952 3570326) (4249365 3567013)
## 10 10 (4212965 3612382) (4219987 3611053)
## # ℹ 90 more rows
This is done in a way that attempts to preserve farm density by picking from the same number of pre-sampled points per Voronoi tesselation unit (default 10), which reduces the chances of e.g. placing all new points within the same large area surrounding a single farm.
We can see how far each farm has moved using the following code:
random_farms |>
mutate(Change = st_sfc(mapply(function(a,b){st_cast(st_union(a,b),"LINESTRING")}, point, RandomPoint, SIMPLIFY=FALSE), crs = st_crs(point))) ->
random_farms
ggplot() +
geom_sf(data=load_map("DK032")) +
geom_sf(data=dk032_farmland, fill="dark green", alpha=0.5) +
geom_sf(aes(geometry=point), random_farms, col="red") +
geom_sf(aes(geometry=RandomPoint), random_farms, col="green") +
geom_sf(aes(geometry=Change), random_farms) +
theme_void()
Increasing the randomise_size argument and re-running the code should result in farms moving on average further:
randomise_voronoi(dk032_farmland, fake_farms, randomise_size=25L, verbose=0L) |>
mutate(Change = st_sfc(mapply(function(a,b){st_cast(st_union(a,b),"LINESTRING")}, point, RandomPoint, SIMPLIFY=FALSE), crs = st_crs(point))) ->
random_farms
## Getting random points...
ggplot() +
geom_sf(data=load_map("DK032")) +
geom_sf(data=dk032_farmland, fill="dark green", alpha=0.5) +
geom_sf(aes(geometry=point), random_farms, col="red") +
geom_sf(aes(geometry=RandomPoint), random_farms, col="green") +
geom_sf(aes(geometry=Change), random_farms) +
theme_void()
TODO: when adding the buffer argument for sample_points make sure this is passed through from randomise_voronoi as well
The package is still under active development and is subject to some change, however we will try to keep the interfaces discussed above consistent. In particular, we acknowledge that the help files are fairly non-existent at the moment - sorry - but these will be updated ASAP. When new release versions are uploaded to drat, this guide will be updated with the new features.
At some point soon ish we will be summarising the main features of this package as part of a paper (linked to both Mossa’s PhD and DigiVet). In the meantime, comments and suggestions are very welcome!
sessionInfo()
## R version 4.2.2 (2022-10-31)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] sf_1.0-9 hexscape_0.6.2-1 lubridate_1.9.2 forcats_1.0.0
## [5] stringr_1.5.0 dplyr_1.1.2 purrr_1.0.1 readr_2.1.4
## [9] tidyr_1.3.0 tibble_3.2.1 ggplot2_3.4.2 tidyverse_2.0.0
## [13] pbapply_1.7-0
##
## loaded via a namespace (and not attached):
## [1] httr_1.4.6 sass_0.4.7 jsonlite_1.8.7
## [4] geojsonlint_0.4.0 here_1.0.1 bslib_0.5.0
## [7] RcppParallel_5.1.6 assertthat_0.2.1 countrycode_1.4.0
## [10] highr_0.10 cellranger_1.1.0 yaml_2.3.7
## [13] pillar_1.9.0 backports_1.4.1 glue_1.6.2
## [16] digest_0.6.33 RefManageR_1.4.0 colorspace_2.1-0
## [19] stringfish_0.15.7 htmltools_0.5.5 plyr_1.8.8
## [22] pkgconfig_2.0.3 rmapshaper_0.4.6 bibtex_0.5.1
## [25] httpcode_0.3.0 broom_1.0.3 scales_1.2.1
## [28] RApiSerialize_0.1.2 tzdb_0.4.0 timechange_0.2.0
## [31] proxy_0.4-27 farver_2.1.1 generics_0.1.3
## [34] cachem_1.0.8 withr_2.5.0 cli_3.6.1
## [37] magrittr_2.0.3 readxl_1.4.3 evaluate_0.21
## [40] fansi_1.0.4 xml2_1.3.3 class_7.3-21
## [43] tools_4.2.2 hms_1.1.3 lifecycle_1.0.3
## [46] V8_4.2.2 munsell_0.5.0 compiler_4.2.2
## [49] jquerylib_0.1.4 e1071_1.7-12 qs_0.25.4
## [52] rlang_1.1.1 classInt_0.4-8 units_0.8-1
## [55] grid_4.2.2 rstudioapi_0.14 regions_0.1.8
## [58] rmarkdown_2.23 codetools_0.2-18 gtable_0.3.3
## [61] jsonvalidate_1.3.2 DBI_1.1.3 curl_5.0.1
## [64] R6_2.5.1 knitr_1.43 eurostat_3.7.10
## [67] fastmap_1.1.1 utf8_1.2.3 rprojroot_2.0.3
## [70] KernSmooth_2.23-20 stringi_1.7.12 parallel_4.2.2
## [73] crul_1.3 Rcpp_1.0.11 vctrs_0.6.3
## [76] tidyselect_1.2.0 xfun_0.39